Abstract

Detecting translated open reading frames (ORFs) within Ribo-Seq data often requires evaluating the reads within your captured ORFs to determine if typical properties of translation, such as triplet periodicity are present. inspectorORF aims to simplify the task of plotting ORFs, with the following features:

Standard workflow

Install Bioconductor dependencies -

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install(c("plyranges", "bioseq", "Biostrings", "GenomicRanges", "rtracklayer")

Load the inspectorORF library -

library(inspectorORF)

inspectorORF requires genomic sequences to be in 2bit format. Your genome FASTA of interest can be converted to with UCSC’s TwoBit tool faToTwoBit as follows:

faToTwoBit gencode.v44.genome.fasta gencode.v44.genome.2bit

Additionally, coverage of the RNA-Seq data needs to be calculated. This can be performed with bedtools, using bedtools coverage:

bedtools coverage -abam sample_alignment.bam -b gencode.v44.annotation.gtf > control_rna_tracks.bed
gzip control_rna_tracks.bed # optional, but recommended, especially if dealing with large genomes.

A note if using RiboTaper data - RNA-Seq coverage is performed as part of the RiboTaper pipeline and therefore, the bedtools coverage step is not required.

Merging RNA-Seq and P-Site coverage

Before usage, reads from either ORFquant and bedtoolsbedtools or RiboTaper need to be imported and processed.

RiboTaper

As RiboTaper calculates both the RNA-Seq coverage and P-sites for you, simply using inspectorORF::import_tracks_from_RiboTaper() with one or more of the Psit_Ribo_Rna_Cent_tracks_* files found within the tracks_data folder of the RiboTaper results.

Please note, no example data is provided for running an example with RiboTaper data, and is simply provided to demonstrate how to create a tracks object from RiboTaper data. Additionally, inspectorORF::import_tracks_from_RiboTaper only supports one sample and therefore, multiple conditions, replicates, or additional data can not be supplied when importing from RiboTaper data. For plotting additional conditions or extra genomic data, the inspectorORF::merge_tracks_with_ORFquant function is required.

# No such RiboTaper data is provided for running as an
# example, all example data focuses on ORFquant since
# this is the successor to RiboTaper.
tracks <- inspectorORF::import_tracks_from_RiboTaper(
    c("Psit_Ribo_Rna_Cent_tracks_ccds", 
        "Psit_Ribo_Rna_Cent_tracks_nonccds")
)

ORFquant

Data from ORFquant can also be utilised. RNA-Seq coverage previously calculated with bedtools coverage and the P-sites calculated by ORFquant can be merged with the inspectorORF::merge_tracks_with_ORFquant function as follows, which will return a GRanges object.

All example data has been processed from GEO accession GSE16214, aligned to Gencode V44 and subset for the NTN4 gene.

tracks <- inspectorORF::merge_RNA_tracks_with_ORFquant(
    rna_reads = system.file("example_data", "control_rna_tracks.bed.gz", package = "inspectorORF"),
    orfquant_psites = system.file("example_data", "control_psites_for_ORFquant", package = "inspectorORF")
)
#> [1] "Reading in RNA-Seq data"
#> [1] "Importing data for rna_reads"
#> [1] "Reading in ORFquant P-Site data"
#> [1] "Importing data for p_sites"
#> [1] "Merging RNA-Seq reads with P-Sites"

If we evaluate the first 6 rows of the returned genome_tracks object, this shows the reads we have at each chromosome coordinate:

datatable(as.data.frame(tracks), options = list(pageLength = 10))

If you wish to add additional conditions, such as a treated condition, the extra and extra_orfquant_psites parameters can be utilised. These options expect a named list, where each element consists of the file name of the data to be added. Any names within the list can be used, as this information will be utilised to created an additional column name.

For example, if we have a treated condition consisting of RNA-Seq reads and calculated P-sites, the following example would create create tracks for multiple conditions:

tracks <- inspectorORF::merge_RNA_tracks_with_ORFquant(
    rna_reads = system.file("example_data", "control_rna_tracks.bed.gz", package = "inspectorORF"),
    orfquant_psites = system.file("example_data", "control_psites_for_ORFquant", package = "inspectorORF"),
    extra = list(treated_rna_reads = system.file("example_data", "treated_rna_tracks.bed.gz", package = "inspectorORF")),
    extra_orfquant_psites = list(treated_psites = system.file("example_data", "treated_psites_for_ORFquant", package = "inspectorORF"))
)
#> [1] "Reading in RNA-Seq data"
#> [1] "Importing data for rna_reads"
#> [1] "Importing data for treated_rna_reads"
#> [1] "Reading in ORFquant P-Site data"
#> [1] "Importing data for p_sites"
#> [1] "Importing data for treated_psites"
#> [1] "Merging RNA-Seq reads with P-Sites"
# Optionally save the GRanges data into an RData file
save(tracks, file = "tracks.RData")

Addtional track information from the treated RNA-Seq and ORFquant data can be seen when we evaluate the tracks with head:

datatable(as.data.frame(tracks), options = list(pageLength = 10))

Obtaining genes or transcripts of interest

To plot reads from a transcript or ORF, reads associated with either genes(s) or transcript(s) of interest must be obtained. Gene level tracks requires additional steps for plotting but has the added benefit of being able to plot differing transcript variants of interest.

The example data provided consists of reads from RNA-Seq and P-Sites calculated from Ribo-Seq data (processed with ORFquant) for the NTN4 gene from undifferentiated (control) and differentiated (ra-treated) SH-SY5Y cells. To obtain gene level tracks for the human gene netrin 4 (NTN4, gene id ENSG00000074527.13), save the data as a bed file for future importing and obtain transcript level tracks for the transcript variant ENST00000343702.9 for plotting can be performed as follows:

# Obtain gene level tracks for the NTN4 gene - ENSG00000074527.13
gene_tracks <- inspectorORF::get_gene_tracks(
    tracks,
    gtf_file = system.file("example_data", "annotation_subset.gtf", package = "inspectorORF"),
    genome_file = system.file("example_data", "chr12.2bit", package = "inspectorORF"),
    gene_ids = c("ENSG00000074527.13"),
    framed_tracks = c("p_sites", "treated_psites") # required when multiple ORFquant data us provided
)

# Obtain tracks for specific transcript variants of the NTN4 gene.
tx_tracks_from_gene <- inspectorORF::gene_to_transcript_tracks(
    gene_tracks,
    transcript_filter = c("ENST00000343702.9") # multiple transcript ids supported
)
# Save the gene level track information as a bed file for future use.
inspectorORF::save_tracks(gene_tracks, "NTN4_gene_tracks.bed")

Alternatively, tracks for just the NTN4 transcript ENST00000343702.9 can be directly obtained and saved, requiring fewer steps but will need to be repeated if alternative transcripts of NTN4 wish to be plot in future.

tx_tracks <- inspectorORF::get_transcript_tracks(
    tracks,
    gtf_file = system.file("example_data", "annotation_subset.gtf", package = "inspectorORF"),
    genome_file = system.file("example_data", "chr12.2bit", package = "inspectorORF"),
    transcript_ids = c("ENST00000343702.9"), # multiple transcript ids supported
    framed_tracks = c("p_sites", "treated_psites") # required when multiple ORFquant data us provided
)
# Save the transcript level track information as a bed file for future use.
inspectorORF::save_tracks(tx_tracks, "NTN4_ENST00000343702_tracks.bed")

Plotting transcripts of interest

A simple transcript plot can be made by calling inspectorORF::transcript_plot on our tx_tracks object, requesting the transcript of interest

A plot of reads across a full transcript of interest can then be performed the inspectorORF::transcript_plot. The RNA-Seq coverage being peaks around the P-Site coverage rather than a more uniform coverage due to these reads being derived from polysome-associated RNA-Seq data.

inspectorORF::transcript_plot(
    tx_tracks,
    transcript_filter = "ENST00000343702.9" # Requires 1 transcript at a time
)

or, plot a transcript from tracks created by gene information originally:

inspectorORF::transcript_plot(
    tx_tracks_from_gene,
    transcript_filter = "ENST00000343702.9" # Requires 1 transcript at a time
)

Exons making up a transcript can also be plot by splitting the figure into exons with the split_exons = TRUE option -

inspectorORF::transcript_plot(
    tx_tracks,
    transcript_filter = "ENST00000343702.9", # Requires 1 transcript at a time
    split_exons = TRUE
)

As can be seen in the above plot, although both RNA-Seq reads and P-Sites are found across the transcript, such reads are limited and fairly low, suggesting any coding sequences found within this transcript to not be translated. The original ORFquant results for this dataset also reflect this, with no coding sequence within ENST00000343702.9 being flagged as translated.

The annotated coding sequence within ENST00000343702.9 can be plot to get a better understanding the P-Site reads we do have, to see if they are primarily within the coding sequence. This can be undertaken by highlighting a region of interest by specifying a start_position and stop_position.

Please note, no checks will be performed if start_position and stop_position are true start and stop codons, or if they’re within the same frame. Plots will be given regardless.

inspectorORF::transcript_plot(
    tx_tracks,
    start_position = 456,
    stop_position = 2339,
    transcript_filter = "ENST00000343702.9" # Requires 1 transcript at a time
)

If RNA-Seq reads have a much higher depth to the Ribo-Seq/P-site information, such as with poorly translated lncRNAs, the RNA-Seq reads can be capped to the highest depth P-Site read by setting scale_to_psites = TRUE

inspectorORF::transcript_plot(
    tx_tracks,
    transcript_filter = "ENST00000343702.9", # Requires 1 transcript at a time
    start_position = 456,
    stop_position = 2339,
    scale_to_psites = T
)

Again, although reads are within the highlighted coding sequence, such reads are low, making it unlikely that this coding sequence is truly translated.

Evaluating P-sites and triplet periodicity of reads outside of the ORF compared to the triplet periodicity within the ORF (Full transcript P-sitesORF P-sites\text{Full transcript P-sites} - \text{ORF P-sites}) can be helpful to understand the likelihood the ORF is being translated. As such, if triplet periodicity is present within the ORF, with reads primarily consisting of the ORFs frame, but triplet periodicity is lacking within the remainder of the transcript, this can give some confidence in the ORF truly being translated.

An additional triplet periodicity summary plot for the remainder of the transcript can be enabled with the summarise_outside_region = TRUE:

inspectorORF::transcript_plot(
    tx_tracks,
    transcript_filter = "ENST00000343702.9", # Requires 1 transcript at a time
    start_position = 456,
    stop_position = 2339,
    scale_to_psites = T,
    summarise_outside_region = T
)

Retaining reads within introns for plotting

During gene or transcript track generation, reads within introns can be retained with the keep_introns = TRUE to enable plotting intronic reads later -

my_tracks <- inspectorORF::merge_RNA_tracks_with_ORFquant(
    rna_reads = system.file("example_data", "control_rna_tracks.bed.gz", package = "inspectorORF"),
    orfquant_psites = system.file("example_data", "control_psites_for_ORFquant", package = "inspectorORF")
)
#> [1] "Reading in RNA-Seq data"
#> [1] "Importing data for rna_reads"
#> [1] "Reading in ORFquant P-Site data"
#> [1] "Importing data for p_sites"
#> [1] "Merging RNA-Seq reads with P-Sites"

gene_tracks <- inspectorORF::get_gene_tracks(
    my_tracks,
    gtf_file = system.file("example_data", "annotation_subset.gtf", package = "inspectorORF"),
    genome_file = system.file("example_data", "chr12.2bit", package = "inspectorORF"),
    keep_introns = TRUE,
    gene_ids = c("ENSG00000074527.13")
)

# Any track generation from gene tracks will therefore contain intronic reads
tx_from_gene_tracks <- inspectorORF::gene_to_transcript_tracks(
    gene_tracks,
    transcript_filter = c("ENST00000343702.9") # multiple transcript ids supported
)

# Or, with get_transcript_tracks directly

my_tx_tracks <- inspectorORF::get_transcript_tracks(
    my_tracks,
    gtf_file = system.file("example_data", "annotation_subset.gtf", package = "inspectorORF"),
    genome_file = system.file("example_data", "chr12.2bit", package = "inspectorORF"),
    transcript_ids = c("ENST00000343702.9"), # multiple transcript ids supported
    keep_introns = TRUE,
    framed_tracks = c("p_sites") # required when multiple ORFquant data us provided
)

Including and annotating introns within a plot can then be performed by setting the split_exons parameter of transcript_plot to “with_introns” -

inspectorORF::transcript_plot(
    my_tx_tracks,
    transcript_filter = "ENST00000343702.9", # Requires 1 transcript at a time
    split_exons = "with_introns"
)

Plotting ORFs of interest

Finding ORFs

Helper functions are provided to search for ORFs and obtain the nucleotide sequence, amino acid sequence, and framing scores for each ORF frame.

These are not strictly required, as transcript_filter, start_position and optionally stop_position can be specified manually within the plotting functions.

If ORFs of interest aren’t known, or you wish to identify ORFs (simply a start and stop, these functions do not consider reads to detect translated ORFs. Tools exist for such analysis, such as ORFquant) to evaluate reads, we use the inspectorORF::find_orfs function to search for ORFs in a given transcript.

# Find ORFs in a specific transcript
orfs <- find_orfs(
    tx_tracks,
    transcript_filter = "ENST00000343702.9"
)

# Show the ORFs as a tibble, displaying start codon, stop codon, framing and reads present
datatable(as.data.frame(orfs), options = list(pageLength = 10))

Additional filter options can also be supplied to inspectorORF::find_orfs, such as:

  • frame_filter, to specify which frames to look in. Defaults to c(0, 1, 2)
  • start_threshold_filter, to specify how far into a transcript sequence to start searching. Defaults to 13
  • start_codon_filter, to specify the codon an ORF must start with. Defaults to c(“AUG”, “CUG”, “GUG”, “UUG”, “AAG”, “ACG”)
  • stop_codon_filter, to specify the codon an ORF must stop with. Defaults to c(“UAG”, “UGA”, “UAA”)
  • orf_length_filter, to specify the minimum nt length an ORF must be. Defaults to 3

ORF information

Additional helper functions exist to obtain both nucleotide sequences and amino acid residues of an ORF.

Nucleotide sequences are obtained with inspectorORF::get_orf_nt_seq, requiring the transcript of interest, and either an object from the list provided by inspectorORF::find_orfs, or a start_position parameter which will obtain the sequence up to the nearest in-frame stop codon. Optionally, a stop_poisition parameter can be supplied to override the detection of the nearest in-frame stop codon.

orf_nt_sequence <- inspectorORF::get_orf_nt_seq(
    tx_tracks,
    transcript_filter = "ENST00000343702.9", 
    start_position = 456
)

print(orf_nt_sequence)
#> RNAStringSet object of length 1:
#>     width seq                                               names               
#> [1]  1884 AUGGGGAGCUGCGCGCGGCUGCU...AUAUUUUAAAAAGAGAGUGCAAG ENST00000343702.9...

To obtain the amino acid residues of an ORF, the inspectorORF::get_orf_aa_seq function is utilised, accepting the same options as inspectorORF::get_orf_nt_seq.

orf_aa_sequence <- inspectorORF::get_orf_aa_seq(
    tx_tracks,
    transcript_filter = "ENST00000343702.9",
    start_position = 456
)

print(orf_aa_sequence)
#> AAStringSet object of length 1:
#>     width seq                                               names               
#> [1]   628 MGSCARLLLLWGCTVVAAGLSGV...SFVQHWKPSLGRKVMDILKRECK ENST00000343702.9...

ORF plotting

Plotting just an ORF an interest can be performed with inspectorORF::orf_plot. This accepts similar options to inspectorORF::transcript_plot

Single conditions

Plotting ORFs can either be undertaken by supplying an object ORF position identified by inspectorORF::find_orfs,

inspectorORF::orf_plot(
    tx_tracks,
    orf_object = inspectorORF::get_orf(orfs, start_position = 456)
)

Or by supplying the transcript_id, start_position and stop_position positions directly.

Please note, no checks will be performed if orf_start and orf_stop are true start and stop codons, or if they’re within the same frame. Plots will be given regardless.

inspectorORF::orf_plot(
    tx_tracks,
    transcript_filter = "ENST00000343702.9", 
    start_position = 456
)

Reads outside of the ORF of interest can also be included by setting plot_region = c(-1, -1) which will plot these regions as 5’ and 3’.

However, this can be particularly useful to plot a portion of the untranslated region upstream, downstream or both, of the ORF. this can be included in the plot with plot_region as specific sizes. These values are relative to the provided start and stop positions. E.G. plot_region = c(500, NA), plot_region = c(NA, 500), and plot_region = c(500, NA) will include 50nt upstream, 50nt downstream, and 50nt both upstream and downstream to the ORF requested.

inspectorORF::orf_plot(
    tx_tracks,
    transcript_filter = "ENST00000343702.9",
    start_position = 456,
    stop_position = 2339,
    plot_region = c(500, 500)
)

Multiple conditions

If the data was prepared with either multiple conditions or multiple replicates, inspectorORF supports plotting multiple figures of the same ORF for more than one condition. For example:

inspectorORF::orf_plot(
    tx_tracks,
    transcript_filter = "ENST00000343702.9",
    start_position = 456,
    stop_position = 2339,
    scale_to_psites = T,
    # Set two names for Control and Treated condition
    condition_names = c("rna_reads" = "Control",
                        "treated_rna_reads" = "Treated"),
    # Declare names for the RNA and P-Site reads
    plot_read_pairs = c("p_sites" = "rna_reads",
                        "treated_psites" = "treated_rna_reads"),
    # Declare names for the RNA and P-Site reads
    dataset_names = c("rna_reads" = "RNA-Seq Reads",
                    "p_sites" = "P-Sites",
                    "treated_rna_reads" = "RNA-Seq Reads",
                    "treated_psites" = "P-Sites"),
    # State the colours for the additional treated_rna_reads
    plot_colours = c("rna_reads" = "grey60",
                    "treated_rna_reads" = "grey60",
                    "0" = "#440854",
                    "1" = "#23A884",
                    "2" = "#FEE725")
)

This plot shows that coverage for both RNA-Seq and P-Sites are significantly higher and spread across the coding sequence of interested in our treated (differentiated) cells. Suggesting the coding sequence is translated.

This reflects what ORFquant reports too, with this coding region being flagged as translated -

seqnames start end width strand
ENST00000343702.9 456 2339 1884 -

Plotting additional information

Additional data

Additional data can be supplied at either the generation of genome tracks with additional information such as reads from long-read sequencing, or at the transcript level of track extraction with the additional_info option of inspectorORF::get_transcript_tracks such as calculated Kozak scores or transcript level mapped proteomics data.

# Create the main tracks information consisting of data prepared from long-read sequencing.
tracks <- inspectorORF::merge_RNA_tracks_with_ORFquant(
    rna_reads = system.file("example_data", "control_rna_tracks.bed.gz", package = "inspectorORF"),
    orfquant_psites = system.file("example_data", "control_psites_for_ORFquant", package = "inspectorORF"),
    extra = list(treated_rna_reads = system.file("example_data", "treated_rna_tracks.bed.gz", package = "inspectorORF"),
                # Control dRNA tracks not provided due to such reads being absent within the control cells.
                treated_dRNA_reads = system.file("example_data", "treated_dRNA_tracks.bed.gz", package = "inspectorORF")),
    extra_orfquant_psites = list(treated_psites = system.file("example_data", "treated_psites_for_ORFquant", package = "inspectorORF"))
)
#> [1] "Reading in RNA-Seq data"
#> [1] "Importing data for rna_reads"
#> [1] "Importing data for treated_rna_reads"
#> [1] "Importing data for treated_dRNA_reads"
#> [1] "Reading in ORFquant P-Site data"
#> [1] "Importing data for p_sites"
#> [1] "Importing data for treated_psites"
#> [1] "Merging RNA-Seq reads with P-Sites"

Supplying additional transcript level information requires the data to be in the form of a data frame (or something which can be cohersed into a data frame) with the following layout.

transcript_id position custom_column
ENST1… 5 0.5
ENST1… 6 0.8
ENST2… 2 1.0

Multiple custom columns can be supplied and the values within are required to be numeric, leading to an additional bar graph for each custom column found.

For example, Kozak scores were previously calculated in-frame AUG codons for our transcript of interest (using a field labelled KSS) along with proteomics evidence (using a field labelled MS)

NTN4_KSS_and_MS <- read_csv(system.file("example_data", "NTN4_additional_tx_data.csv", package = "inspectorORF"))
#> Rows: 3882 Columns: 4
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): transcript_id
#> dbl (3): position, KSS, MS
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
datatable(NTN4_KSS_and_MS, options = list(pageLength = 10))

Kozak scores can be calculated using ORFik or a webservice, for example https://www.tispredictor.com/kss. In the case of the proteomics data, we have set regions of a transcript with no proteomic evidence as zeros and regions of transcript where a peptide hit was present as values depending on how many overlapping hits are identified, which will result in solid blocks of colour for such regions.

Finally, the transcript tracks for the additional genomic and transcriptomic level data can be obtained and plotted.

# Obtain transcript tracks for ENST00000343702.9 along with our additional scores
tx_tracks <- inspectorORF::get_transcript_tracks(
    tracks,
    gtf_file = system.file("example_data", "annotation_subset.gtf", package = "inspectorORF"),
    genome_file = system.file("example_data", "chr12.2bit", package = "inspectorORF"),
    transcript_ids = c("ENST00000343702.9"),
    additional_info = NTN4_KSS_and_MS,
    framed_tracks = c("p_sites", "treated_psites")
)

inspectorORF::orf_plot(
    tx_tracks,
    transcript_filter = "ENST00000343702.9",
    start_position = 456,
    stop_position = 2339,
    scale_to_psites = T,
    condition_names = c("rna_reads" = "Control",
                        "treated_rna_reads" = "Treated"),
    plot_read_pairs = c("p_sites" = "rna_reads",
                        "treated_psites" = "treated_rna_reads"),
    # Declare names for the RNA and P-Site reads, including the additional
    # dRNA_reads, KSS and MS data for them to be included on the plot
    dataset_names = c("rna_reads" = "RNA-Seq Reads",
                    "p_sites" = "P-Sites",
                    "treated_rna_reads" = "RNA-Seq Reads",
                    "treated_psites" = "P-Sites",
                    "treated_dRNA_reads" = "Treated dRNA-Seq Reads",
                    "KSS" = "AUG Kozak scores",
                    "MS" = "Proteomics"),
    # State the colours for the additional data
    plot_colours = c("rna_reads" = "grey60",
                    "treated_rna_reads" = "grey60",
                    "treated_dRNA_reads" = "orange",
                    "KSS" = "hotpink4",
                    "MS" = "black",
                    "0" = "#440854",
                    "1" = "#23A884",
                    "2" = "#FEE725")
)

Annotating codons

Codons can be labelled on the plot in order to evaluate alternative start codons that may be present within the transcript or ORF. This is particularly useful if there are upstream start codons which may act as an alternative initiation site, or for the analysis of near-cognate start codons (E.G CUG, or GUG).

Therefore, additional options are supplied within inspectorORF to annotate the following:

  • The ORFs start codon.
  • Nearby in-frame and out of frame alternative start codons.
  • Near-cognate start codons.
  • Downstream stop codons which may be utilised in the case of stop codon readthrough.

Annotating such codons is performed by supplying a list of queries with the codon_queries parameter of inspectorORF::orf_plot, specifying what codon(s) to plot, their framing, and colour.

E.G, to label the ORFs start codon and in frame only canonical (AUG) start codons:

inspectorORF::orf_plot(
    tx_tracks,
    transcript_filter = "ENST00000343702.9",
    start_position = 456,
    stop_position = 2339,
    scale_to_psites = T,
    condition_names = c("rna_reads" = "Control",
                        "treated_rna_reads" = "Treated"),
    plot_read_pairs = c("p_sites" = "rna_reads",
                        "treated_psites" = "treated_rna_reads"),
    # Declare names for the RNA and P-Site reads, including the additional
    # dRNA_reads, KSS and MS data for them to be included on the plot
    dataset_names = c("rna_reads" = "RNA-Seq Reads",
                    "p_sites" = "P-Sites",
                    "treated_rna_reads" = "RNA-Seq Reads",
                    "treated_psites" = "P-Sites",
                    "treated_dRNA_reads" = "Treated dRNA-Seq Reads",
                    "KSS" = "AUG Kozak scores",
                    "MS" = "Proteomics"),
    # State the colours for the additional data
    plot_colours = c("rna_reads" = "grey60",
                    "treated_rna_reads" = "grey60",
                    "treated_dRNA_reads" = "orange",
                    "KSS" = "hotpink4",
                    "MS" = "black",
                    "0" = "#440854",
                    "1" = "#23A884",
                    "2" = "#FEE725"),
    codon_queries = list(
        inspectorORF::codon_query(annotate_start = T, colour = "black"),
        inspectorORF::codon_query(annotation_codons = c("AUG"), in_frame = T, colour = "darkred")
    )
)

Additionally, codons can be coloured based on the frame the codon is found in, by setting “use_framing” as the colour option. Allowing out of frame codons can be useful here too, although in this case, this results in a fairly noisy plot:

inspectorORF::orf_plot(
    tx_tracks,
    transcript_filter = "ENST00000343702.9",
    start_position = 456,
    stop_position = 2339,
    scale_to_psites = T,
    condition_names = c("rna_reads" = "Control",
                        "treated_rna_reads" = "Treated"),
    plot_read_pairs = c("p_sites" = "rna_reads",
                        "treated_psites" = "treated_rna_reads"),
    # Declare names for the RNA and P-Site reads, including the additional
    # dRNA_reads, KSS and MS data for them to be included on the plot
    dataset_names = c("rna_reads" = "RNA-Seq Reads",
                    "p_sites" = "P-Sites",
                    "treated_rna_reads" = "RNA-Seq Reads",
                    "treated_psites" = "P-Sites",
                    "treated_dRNA_reads" = "Treated dRNA-Seq Reads",
                    "KSS" = "AUG Kozak scores",
                    "MS" = "Proteomics"),
    # State the colours for the additional data
    plot_colours = c("rna_reads" = "grey60",
                    "treated_rna_reads" = "grey60",
                    "treated_dRNA_reads" = "orange",
                    "KSS" = "hotpink4",
                    "MS" = "black",
                    "0" = "#440854",
                    "1" = "#23A884",
                    "2" = "#FEE725"),
    codon_queries = list(
        inspectorORF::codon_query(annotation_codons = c("AUG"), colour = "use_framing")
    )
)

As can be seen in the plot, there are many AUG start codons within the ORF, consisting of various frames. Such codons can also be plotted within the UTRs, to evaluate potential upstream alternative initiation sites. Or we can evaluate in-frame codons, across the whole transcript:

inspectorORF::orf_plot(
    tx_tracks,
    transcript_filter = "ENST00000343702.9",
    start_position = 456,
    stop_position = 2339,
    scale_to_psites = T,
    summarise_outside_orf = T,
    plot_region = c(-1, -1),
    text_size = 14,
    condition_names = c("rna_reads" = "Control",
                        "treated_rna_reads" = "Treated"),
    plot_read_pairs = c("p_sites" = "rna_reads",
                        "treated_psites" = "treated_rna_reads"),
    # Declare names for the RNA and P-Site reads, including the additional
    # dRNA_reads, KSS and MS data for them to be included on the plot
    dataset_names = c("rna_reads" = "RNA-Seq Reads",
                    "p_sites" = "P-Sites",
                    "treated_rna_reads" = "RNA-Seq Reads",
                    "treated_psites" = "P-Sites",
                    "treated_dRNA_reads" = "Treated\ndRNA-Seq\nReads",
                    "KSS" = "AUG Kozak\nscores",
                    "MS" = "Proteomics"),
    # State the colours for the additional data
    plot_colours = c(
        "rna_reads" = "grey70",
        "treated_rna_reads" = "grey60",
        "treated_dRNA_reads" = "#999900",
        "KSS" = "#008080",
        "MS" = "black",
        "0" = "#B12A90",
        "1" = "#0072B2",
        "2" = "#E66100"
    ),
    codon_queries = list(
        inspectorORF::codon_query(annotation_codons = c("AUG"), in_frame = T, colour = "use_framing")
    )
)

Two observations can be made from this plot. First of all, there are no upstream in-frame AUG start codons, suggesting initiation does occur at the start codon described for the coding seauence. Secondly, even if we evaluate alternative start codons, although two upstream in-frame CUG codons are present, there is zero evidence of these being utilised for initiation due to the lack of P-Site coverage downstream to these.

For example, to include in-frame upstream AUG and the near cognate codon CUG:

inspectorORF::orf_plot(
    tx_tracks,
    transcript_filter = "ENST00000343702.9",
    start_position = 456,
    stop_position = 2339,
    scale_to_psites = T,
    # Add 5'UTR region from start of transcript
    plot_region = c(-1, NA),
    condition_names = c("rna_reads" = "Control",
                        "treated_rna_reads" = "Treated"),
    plot_read_pairs = c("p_sites" = "rna_reads",
                        "treated_psites" = "treated_rna_reads"),
    # Declare names for the RNA and P-Site reads, including the additional
    # dRNA_reads, KSS and MS data for them to be included on the plot
    dataset_names = c("rna_reads" = "RNA-Seq Reads",
                    "p_sites" = "P-Sites",
                    "treated_rna_reads" = "RNA-Seq Reads",
                    "treated_psites" = "P-Sites",
                    "treated_dRNA_reads" = "Treated dRNA-Seq Reads",
                    "KSS" = "AUG Kozak scores",
                    "MS" = "Proteomics"),
    # State the colours for the additional data
    plot_colours = c("rna_reads" = "grey60",
                    "treated_rna_reads" = "grey60",
                    "treated_dRNA_reads" = "orange",
                    "KSS" = "hotpink4",
                    "MS" = "black",
                    "0" = "#440854",
                    "1" = "#23A884",
                    "2" = "#FEE725"),
    codon_queries = list(
        inspectorORF::codon_query(annotate_start = T, colour = "black"),
        inspectorORF::codon_query(annotation_codons = c("AUG"), in_frame = T, colour = "darkred"),
        inspectorORF::codon_query(annotation_codons = c("CUG"), in_frame = T, colour = "darkgreen")
    )
)

When annotating codons, stop codons can also be included with annotate_stop = TRUE. This can be particularly useful when annotating in-frame start codons across the transcript to indicate alternative ORFs which are possibly translated.

inspectorORF::orf_plot(
    tx_tracks,
    transcript_filter = "ENST00000343702.9",
    start_position = 456,
    stop_position = 2339,
    scale_to_psites = T,
    plot_region = c(-1, -1),
    condition_names = c("rna_reads" = "Control",
                        "treated_rna_reads" = "Treated"),
    plot_read_pairs = c("p_sites" = "rna_reads",
                        "treated_psites" = "treated_rna_reads"),
    # Declare names for the RNA and P-Site reads, including the additional
    # dRNA_reads, KSS and MS data for them to be included on the plot
    dataset_names = c("rna_reads" = "RNA-Seq Reads",
                    "p_sites" = "P-Sites",
                    "treated_rna_reads" = "RNA-Seq Reads",
                    "treated_psites" = "P-Sites",
                    "treated_dRNA_reads" = "Treated dRNA-Seq Reads",
                    "KSS" = "AUG Kozak scores",
                    "MS" = "Proteomics"),
    # State the colours for the additional data
    plot_colours = c("rna_reads" = "grey60",
                    "treated_rna_reads" = "grey60",
                    "treated_dRNA_reads" = "orange",
                    "KSS" = "hotpink4",
                    "MS" = "black",
                    "0" = "#440854",
                    "1" = "#23A884",
                    "2" = "#FEE725"),
    codon_queries = list(
        inspectorORF::codon_query(in_frame = T, annotate_stop = T, colour = "darkred")
    )
)

By default, annotate_stop = TRUE will search for all three possible stop codons (UAA, UAG and UGA). annotate_stop also accepts a character vector to search for specific stop codons instead -

inspectorORF::orf_plot(
    tx_tracks,
    transcript_filter = "ENST00000343702.9",
    start_position = 456,
    stop_position = 2339,
    scale_to_psites = T,
    plot_region = c(-1, -1),
    condition_names = c("rna_reads" = "Control",
                        "treated_rna_reads" = "Treated"),
    plot_read_pairs = c("p_sites" = "rna_reads",
                        "treated_psites" = "treated_rna_reads"),
    # Declare names for the RNA and P-Site reads, including the additional
    # dRNA_reads, KSS and MS data for them to be included on the plot
    dataset_names = c("rna_reads" = "RNA-Seq Reads",
                    "p_sites" = "P-Sites",
                    "treated_rna_reads" = "RNA-Seq Reads",
                    "treated_psites" = "P-Sites",
                    "treated_dRNA_reads" = "Treated dRNA-Seq Reads",
                    "KSS" = "AUG Kozak scores",
                    "MS" = "Proteomics"),
    # State the colours for the additional data
    plot_colours = c("rna_reads" = "grey60",
                    "treated_rna_reads" = "grey60",
                    "treated_dRNA_reads" = "orange",
                    "KSS" = "hotpink4",
                    "MS" = "black",
                    "0" = "#440854",
                    "1" = "#23A884",
                    "2" = "#FEE725"),
    codon_queries = list(
        inspectorORF::codon_query(in_frame = T, annotate_stop = c("UAG"), colour = "darkred")
    )
)

Annotated codons can also be plot on their on track facet, which is particularly useful in the case of lots of codons. This can be enabled by naming the codon query with annotation_label = "My Codons".

By default, all annotated codons will be within the same track facet titled ‘Labelled Codons’

inspectorORF::orf_plot(
    tx_tracks,
    transcript_filter = "ENST00000343702.9",
    start_position = 456,
    stop_position = 2339,
    scale_to_psites = T,
    # Add 5'UTR region from start of transcript
    plot_region = c(-1, NA),
    condition_names = c("rna_reads" = "Control",
                        "treated_rna_reads" = "Treated"),
    plot_read_pairs = c("p_sites" = "rna_reads",
                        "treated_psites" = "treated_rna_reads"),
    # Declare names for the RNA and P-Site reads, including the additional
    # dRNA_reads, KSS and MS data for them to be included on the plot
    dataset_names = c("rna_reads" = "RNA-Seq Reads",
                    "p_sites" = "P-Sites",
                    "treated_rna_reads" = "RNA-Seq Reads",
                    "treated_psites" = "P-Sites",
                    "treated_dRNA_reads" = "Treated dRNA-Seq Reads",
                    "KSS" = "AUG Kozak scores",
                    "MS" = "Proteomics"),
    # State the colours for the additional data
    plot_colours = c("rna_reads" = "grey60",
                    "treated_rna_reads" = "grey60",
                    "treated_dRNA_reads" = "orange",
                    "KSS" = "hotpink4",
                    "MS" = "black",
                    "0" = "#440854",
                    "1" = "#23A884",
                    "2" = "#FEE725"),
    codon_queries = list(
        inspectorORF::codon_query(annotate_start = T, colour = "black"),
        inspectorORF::codon_query(annotation_codons = c("AUG"), in_frame = T, colour = "darkred"),
        inspectorORF::codon_query(annotation_codons = c("CUG"), in_frame = T, colour = "darkgreen", annotation_label = "CUG Codons")
    )
)

Session Info

sessionInfo()
#> R version 4.4.3 (2025-02-28)
#> Platform: x86_64-conda-linux-gnu
#> Running under: Ubuntu 25.04
#> 
#> Matrix products: default
#> BLAS/LAPACK: /home/aylz/remote/biotools/x86_64/conda/R/lib/libopenblasp-r0.3.30.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
#>  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
#>  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] DT_0.34.0            lubridate_1.9.4      forcats_1.0.0       
#>  [4] stringr_1.5.1        dplyr_1.1.4          purrr_1.1.0         
#>  [7] readr_2.1.5          tidyr_1.3.1          tibble_3.3.0        
#> [10] ggplot2_3.5.2        tidyverse_2.0.0      inspectorORF_1.1.0  
#> [13] GenomicRanges_1.58.0 GenomeInfoDb_1.42.0  IRanges_2.40.0      
#> [16] S4Vectors_0.44.0     BiocGenerics_0.52.0 
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.2.1            farver_2.1.2               
#>  [3] R.utils_2.13.0              Biostrings_2.74.0          
#>  [5] bitops_1.0-9                fastmap_1.2.0              
#>  [7] RCurl_1.98-1.17             GenomicAlignments_1.42.0   
#>  [9] XML_3.99-0.17               digest_0.6.37              
#> [11] timechange_0.3.0            lifecycle_1.0.4            
#> [13] plyranges_1.26.0            magrittr_2.0.3             
#> [15] compiler_4.4.3              rlang_1.1.6                
#> [17] sass_0.4.10                 tools_4.4.3                
#> [19] yaml_2.3.10                 data.table_1.17.8          
#> [21] rtracklayer_1.66.0          knitr_1.50                 
#> [23] labeling_0.4.3              htmlwidgets_1.6.4          
#> [25] S4Arrays_1.6.0              bit_4.6.0                  
#> [27] curl_7.0.0                  DelayedArray_0.32.0        
#> [29] RColorBrewer_1.1-3          abind_1.4-5                
#> [31] BiocParallel_1.40.0         withr_3.0.2                
#> [33] R.oo_1.27.1                 grid_4.4.3                 
#> [35] ggh4x_0.3.1                 scales_1.4.0               
#> [37] SummarizedExperiment_1.36.0 cli_3.6.5                  
#> [39] rmarkdown_2.30              crayon_1.5.3               
#> [41] generics_0.1.4              httr_1.4.7                 
#> [43] tzdb_0.5.0                  rjson_0.2.23               
#> [45] cachem_1.1.0                zlibbioc_1.52.0            
#> [47] parallel_4.4.3              XVector_0.46.0             
#> [49] restfulr_0.0.16             matrixStats_1.5.0          
#> [51] vctrs_0.6.5                 Matrix_1.7-3               
#> [53] jsonlite_2.0.0              hms_1.1.3                  
#> [55] patchwork_1.3.2             bit64_4.6.0-1              
#> [57] ggrepel_0.9.6               bioseq_0.1.5               
#> [59] crosstalk_1.2.2             jquerylib_0.1.4            
#> [61] glue_1.8.0                  minidown_0.4.0             
#> [63] codetools_0.2-20            stringi_1.8.7              
#> [65] gtable_0.3.6                BiocIO_1.16.0              
#> [67] UCSC.utils_1.2.0            pillar_1.11.0              
#> [69] htmltools_0.5.8.1           GenomeInfoDbData_1.2.13    
#> [71] R6_2.6.1                    vroom_1.6.5                
#> [73] evaluate_1.0.4              lattice_0.22-7             
#> [75] Biobase_2.66.0              R.methodsS3_1.8.2          
#> [77] Rsamtools_2.22.0            bslib_0.9.0                
#> [79] Rcpp_1.1.0                  SparseArray_1.6.0          
#> [81] xfun_0.53                   MatrixGenerics_1.18.0      
#> [83] pkgconfig_2.0.3